Exploring big volume sensor data with Vroom


ABSTRACT

State-of-the-art sensors within a single autonomous vehicle (AV) can produce video and LIDAR data at rates greater than 30 GB/hour. Unsurprisingly, even small AV research teams can accumulate tens of terabytes of sensor data from multiple trips and multiple vehicles. AV practitioners would like to extract information about specific locations or specific situations for further study, but often cannot. Queries over AV sensor data differ from generic analytics or spatial queries because they demand reasoning about fields of view as well as heavy computation to extract features from scenes. In this article and demo we present Vroom, a system for ad-hoc queries over AV sensor databases. Vroom combines domain-specific properties of AV datasets with selective indexing and multi-query optimization to address the challenges posed by AV sensor data.
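Why fields of view matter: a plain radius filter on GPS position would also return frames where the car is near a location but its camera points away from it. Below is a minimal sketch of a field-of-view test, assuming a flat local coordinate frame in metres, a known camera heading, and a wedge-shaped horizontal FOV with a range cutoff. `CameraPose` and `sees_point` are hypothetical names for illustration, not part of Vroom.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    x: float            # camera position, metres east in a local planar frame (assumed)
    y: float            # camera position, metres north
    heading_deg: float  # facing direction, degrees counter-clockwise from east (assumed convention)
    hfov_deg: float     # horizontal field of view, degrees
    max_range_m: float  # ignore targets farther away than this


def sees_point(cam: CameraPose, px: float, py: float) -> bool:
    """Return True if point (px, py) lies inside the camera's horizontal
    field-of-view wedge and within max_range_m of the camera."""
    dx, dy = px - cam.x, py - cam.y
    dist = math.hypot(dx, dy)
    if dist > cam.max_range_m:
        return False
    if dist == 0:
        return True
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angle between target bearing and camera heading, wrapped to [-180, 180).
    offset = (bearing - cam.heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= cam.hfov_deg / 2.0


# A forward camera at the origin facing east with a 90-degree FOV:
cam = CameraPose(x=0.0, y=0.0, heading_deg=0.0, hfov_deg=90.0, max_range_m=50.0)
print(sees_point(cam, 10, 5))   # True: about 27 degrees off-axis
print(sees_point(cam, 0, 10))   # False: 90 degrees off-axis
```

A real system would also need the camera's mounting offset relative to the vehicle and a time-synchronized vehicle pose per frame; both are omitted here.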

Introduction

  • AVs generate data from high-resolution cameras, lidar, and GPS at about 10 MB/s.
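As a sanity check, the per-vehicle rate can be converted to hourly and fleet-level volumes. This sketch uses decimal units (1 GB = 1000 MB); the 3-vehicle, 200-hour fleet is a hypothetical input, not a figure from the source.

```python
SECONDS_PER_HOUR = 3600


def gb_per_hour(mb_per_s: float) -> float:
    """Convert a sensor data rate in MB/s to GB/hour (decimal units: 1 GB = 1000 MB)."""
    return mb_per_s * SECONDS_PER_HOUR / 1000


def fleet_tb(mb_per_s: float, vehicles: int, hours_per_vehicle: float) -> float:
    """Total TB collected by a fleet, assuming every vehicle records at the same rate."""
    return gb_per_hour(mb_per_s) * vehicles * hours_per_vehicle / 1000


print(gb_per_hour(10))       # 36.0 (GB per vehicle-hour)
print(fleet_tb(10, 3, 200))  # 21.6 (TB for a hypothetical 3 vehicles x 200 hours)
```

At 10 MB/s a single vehicle produces 36 GB/hour, consistent with the "greater than 30 GB/hour" figure in the abstract, and a few hundred hours of driving across a small fleet already reaches tens of terabytes.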

    Queries:

  • Q1 Compute basic statistics on recent trips, such as data rates by sensor and location coverage.
  • Q2 [building 3D maps] Retrieve all forward-facing video frames of the corner of Vassar and Main St. in Cambridge, MA, ordered clockwise.
  • Q3 [preparing labeled training data] Retrieve lidar and video readings for all cameras in the vehicle, for intervals when any vehicle camera frame shows a bicycle. Group the data by trip, and order it by timestamp within each trip.
  • Q4 [preparing labeled training data] Retrieve all sensor readings in the minute leading up to an interesting event such as a possible near miss (e.g., where the vehicle's CAN bus records a sudden brake or sharp steer). Group the readings by trip and order them by timestamp within each trip.
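Q4 boils down to a time-window join followed by a group-by and sort. Here is a minimal in-memory sketch, assuming readings and events share one clock (seconds) and that events such as hard brakes have already been extracted from the CAN bus. `Reading`, `Event`, and `readings_before_events` are hypothetical names, not Vroom's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    trip_id: str
    sensor: str   # e.g. "front_camera", "lidar" (hypothetical sensor names)
    ts: float     # seconds on the trip's clock (assumed shared with events)


@dataclass(frozen=True)
class Event:
    trip_id: str
    ts: float     # time of, e.g., a sudden-brake CAN message


def readings_before_events(readings, events, window_s=60.0):
    """Return {trip_id: readings sorted by ts} for readings within window_s
    seconds before (and including) any event in the same trip."""
    events_by_trip = defaultdict(list)
    for e in events:
        events_by_trip[e.trip_id].append(e.ts)
    out = defaultdict(list)
    for r in readings:
        # Naive O(readings x events) check; a real system would use an index.
        if any(t - window_s <= r.ts <= t for t in events_by_trip.get(r.trip_id, ())):
            out[r.trip_id].append(r)
    # Group by trip, order by timestamp within each trip.
    return {trip: sorted(rs, key=lambda r: r.ts) for trip, rs in out.items()}


readings = [
    Reading("a", "lidar", 10), Reading("a", "front_camera", 70),
    Reading("a", "front_camera", 50), Reading("a", "lidar", 200),
    Reading("b", "front_camera", 5),
]
events = [Event("a", 100)]
result = readings_before_events(readings, events)
print({trip: [r.ts for r in rs] for trip, rs in result.items()})  # {'a': [50, 70]}
```

Q3 has the same shape: replace the event list with intervals where a bicycle detector fires, and keep the same grouping and ordering step.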

Challenges

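  • Computational intensity of UDFs: user-defined functions such as deep-learning-based classifiers are computationally expensive.
  • Big volumes: ad-hoc queries must run over large historical datasets.
  • Many features of interest: queries target locations, objects (e.g., bicycles), and events (e.g., sudden braking).
  • Interface and storage issues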

Architecture

  • Sophisticated feature precomputation and indexing
  • Synthesizing cheap predicates
  • Memoizing
  • Storage clustering based on the workload
  • Multi-query optimization
  • [to read] Polystore data model
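These notes do not spell out how Vroom implements cheap predicates or memoization. The sketch below shows the generic pattern only: a cheap metadata filter runs first so that fewer frames reach an expensive UDF, and the UDF's results are cached so a repeated query does not recompute them. `detect_bicycle` and `cheap_filter` are hypothetical stand-ins (the "classifier" is placeholder logic). Note that a heuristic filter like "skip highway frames" can drop true positives; whether Vroom's synthesized predicates are conservative is not stated here.

```python
import functools

# Counts how often the expensive UDF body actually runs.
CALLS = {"expensive": 0}


@functools.lru_cache(maxsize=None)
def detect_bicycle(frame_id: str) -> bool:
    """Hypothetical stand-in for a costly DNN classifier; memoized per frame."""
    CALLS["expensive"] += 1
    return "bike" in frame_id  # placeholder logic, not a real model


def cheap_filter(frame: dict) -> bool:
    """Hypothetical cheap predicate over precomputed metadata: skip highway frames."""
    return frame["road_type"] != "highway"


def frames_with_bicycles(frames: list) -> list:
    # Cheap predicate first; only frames that pass it reach the expensive UDF.
    return [f["id"] for f in frames if cheap_filter(f) and detect_bicycle(f["id"])]


frames = [
    {"id": "f1_bike", "road_type": "urban"},
    {"id": "f2", "road_type": "urban"},
    {"id": "f3", "road_type": "highway"},
]
print(frames_with_bicycles(frames))  # ['f1_bike']
print(frames_with_bicycles(frames))  # ['f1_bike'] again, answered from the cache
print(CALLS["expensive"])            # 2: f1_bike and f2 once each; f3 never reaches the UDF
```

In a multi-query setting the same cache lets later queries that ask for bicycles (such as Q3) reuse work done by earlier ones.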

System
